Abstract: It is shown that given two copies of a $q$-ary input
channel $W$, where $q$ is prime, it is possible to create two channels
$W^-$ and $W^+$ whose symmetric capacities satisfy
$I(W^-) \le I(W) \le I(W^+)$, where the inequalities are strict except
in trivial cases. This leads to a simple proof of channel polarization
in the $q$-ary case.

Index Terms: Channel polarization, polar codes, entropy inequality.

I. Introduction and Main Result 

Arıkan's polar codes [1] are a class of 'symmetric capacity'-achieving
codes for binary-input channels. Their block error probability behaves
roughly like $O(2^{-\sqrt{N}})$ [2], where $N$ is the blocklength, and
they achieve this performance at an encoding/decoding complexity of
order $N \log N$.

Polar codes for non-binary input channels were considered in [3]. As
in the binary case, their construction is based on recursively
creating new channels from several copies of the original: Let $W$ be
a discrete memoryless channel with input alphabet
$\mathcal{X} = \{0, \dots, q-1\}$. Throughout this note, $q$ will be
assumed to be a prime number. The output alphabet $\mathcal{Y}$ may be
arbitrary. We will let $I(W) \in [0, 1]$ denote the mutual information
developed across $W$ with uniformly distributed inputs,¹ i.e.,

$$I(W) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{1}{q} W(y \mid x) \log \frac{W(y \mid x)}{\frac{1}{q} \sum_{x' \in \mathcal{X}} W(y \mid x')}.$$
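As a numerical illustration of the definition above, the following minimal Python sketch evaluates $I(W)$ for a channel stored as a row-stochastic matrix. The function name `symmetric_capacity`, the list-of-lists representation `W[x][y]`, and the ternary symmetric example channel are assumptions made for illustration and do not appear in the original note.

```python
import math

def symmetric_capacity(W):
    """Return I(W) in base-q units, where W[x][y] = W(y | x) and q = len(W)."""
    q = len(W)
    n_out = len(W[0])
    total = 0.0
    for y in range(n_out):
        # Output probability under uniform inputs: (1/q) * sum_x' W(y | x').
        p_y = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            w = W[x][y]
            if w > 0.0:  # terms with W(y | x) = 0 contribute nothing
                total += (w / q) * math.log(w / p_y, q)
    return total

# Hypothetical example: ternary symmetric channel with error probability 0.1.
q, eps = 3, 0.1
W = [[1 - eps if y == x else eps / (q - 1) for y in range(q)] for x in range(q)]
print(symmetric_capacity(W))
```

Taking the logarithm to the base $q$ is what confines the result to $[0, 1]$: a noiseless channel evaluates to 1 and a channel whose output is independent of its input evaluates to 0.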

Let $X_1, X_2$ be independent, uniformly distributed inputs to two
independent copies of $W$, and let $Y_1, Y_2$ be the corresponding
outputs. Consider the one-to-one mapping $(X_1, X_2) \to (U_1, U_2)$:

$$U_1 = X_1 + X_2, \qquad U_2 = X_2, \qquad (1)$$

where $+$ denotes modulo-$q$ addition. Observe that $U_1$ and $U_2$ are
independent and uniformly distributed over $\mathcal{X}$. Define the
channels

$$W^- : U_1 \to Y_1 Y_2, \qquad W^+ : U_2 \to Y_1 Y_2 U_1,$$

described through the conditional output probability distributions

$$W^-(y_1, y_2 \mid u_1) = \sum_{u_2 \in \mathcal{X}} \frac{1}{q} W(y_1 \mid u_1 - u_2) W(y_2 \mid u_2),$$

$$W^+(y_1, y_2, u_1 \mid u_2) = \frac{1}{q} W(y_1 \mid u_1 - u_2) W(y_2 \mid u_2),$$

where the difference $u_1 - u_2$ is also taken modulo $q$.

¹ All logarithms in this note will be to the base $q$.

It follows from the chain rule of mutual information that
$I(W^-) + I(W^+) = 2I(W)$. It is also easy to see that $W^+$ is better
than $W$, whereas $W^-$ is worse, in the sense that

$$I(W^-) \le I(W) \le I(W^+). \qquad (2)$$
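The conservation relation and the ordering (2) can be checked numerically on a small example. The sketch below builds the transition matrices of $W^-$ and $W^+$ from their conditional output distributions and evaluates the three symmetric capacities. The helper names, the matrix representation, and the ternary symmetric test channel are assumptions chosen for illustration, not part of the original note.

```python
import math
from itertools import product

def symmetric_capacity(W):
    """I(W) in base-q units for W[x][y] = W(y | x)."""
    q = len(W)
    total = 0.0
    for y in range(len(W[0])):
        p_y = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            if W[x][y] > 0.0:
                total += (W[x][y] / q) * math.log(W[x][y] / p_y, q)
    return total

def minus_channel(W):
    """W^-(y1, y2 | u1) = sum_{u2} (1/q) W(y1 | u1 - u2) W(y2 | u2)."""
    q, m = len(W), len(W[0])
    return [[sum(W[(u1 - u2) % q][y1] * W[u2][y2] for u2 in range(q)) / q
             for y1, y2 in product(range(m), repeat=2)]
            for u1 in range(q)]

def plus_channel(W):
    """W^+(y1, y2, u1 | u2) = (1/q) W(y1 | u1 - u2) W(y2 | u2)."""
    q, m = len(W), len(W[0])
    return [[W[(u1 - u2) % q][y1] * W[u2][y2] / q
             for y1, y2, u1 in product(range(m), range(m), range(q))]
            for u2 in range(q)]

# Hypothetical example: ternary symmetric channel with error probability 0.1.
q, eps = 3, 0.1
W = [[1 - eps if y == x else eps / (q - 1) for y in range(q)] for x in range(q)]
i_w = symmetric_capacity(W)
i_minus = symmetric_capacity(minus_channel(W))
i_plus = symmetric_capacity(plus_channel(W))
print(i_minus, i_w, i_plus, i_minus + i_plus - 2 * i_w)
```

Note that the output alphabet of `minus_channel(W)` has $|\mathcal{Y}|^2$ symbols and that of `plus_channel(W)` has $q|\mathcal{Y}|^2$, so this brute-force representation is only practical for one or two levels of recursion.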

Since $W^-$ and $W^+$ are also $q$-ary input channels, the above
procedure can be applied to each of them, creating the channels
$W^{--} := (W^-)^-$, $W^{-+} := (W^-)^+$, $W^{+-} := (W^+)^-$, and
$W^{++} := (W^+)^+$. Repeating this procedure $n$ times, one obtains
$2^n$ channels $W^s$, $s \in \{-,+\}^n$, with
$\sum_s I(W^s) = 2^n I(W)$. The main observation that leads the author
of [1] to construct polar codes is that these channels are polarized
in the following sense:

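Theorem 1 ([1], [3]).

$$\lim_{n \to \infty} \frac{1}{2^n} \#\{s \in \{-,+\}^n : I(W^s) \in (1-\delta, 1]\} = I(W),$$

$$\lim_{n \to \infty} \frac{1}{2^n} \#\{s \in \{-,+\}^n : I(W^s) \in [0, \delta)\} = 1 - I(W),$$

for all $\delta > 0$.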
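Theorem 1 can be watched in action on a channel for which the recursion has a closed form. The sketch below assumes a $q$-ary erasure channel: if $I(W) = i$, then $U_1 = X_1 + X_2$ is recovered only when neither output is erased, so $I(W^-) = i^2$, and conservation gives $I(W^+) = 2i - i^2$; both new channels again behave as erasure channels, so the rule can be iterated. This special case, the function names, and the parameter values are assumptions for illustration and are not taken from the original note.

```python
def polarize_erasure(i0, n):
    """Symmetric capacities of the 2**n channels W^s, assuming W is a
    q-ary erasure channel with I(W) = i0 (closed-form recursion)."""
    caps = [i0]
    for _ in range(n):
        # Each channel with capacity i splits into i^2 (minus) and 2i - i^2 (plus).
        caps = [c for i in caps for c in (i * i, 2 * i - i * i)]
    return caps

def polarized_fractions(caps, delta):
    """Fractions of channels with I > 1 - delta, I < delta, and in between."""
    total = len(caps)
    good = sum(1 for i in caps if i > 1 - delta) / total
    bad = sum(1 for i in caps if i < delta) / total
    return good, bad, 1.0 - good - bad

i0, delta = 0.6, 0.05
for n in (4, 8, 12, 16):
    caps = polarize_erasure(i0, n)
    print(n, polarized_fractions(caps, delta), sum(caps) / len(caps))
```

The last printed column is the average of $I(W^s)$ over all $s$, which stays at $I(W)$ at every level, while the fraction of channels in the middle interval shrinks as $n$ grows.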

The proofs given in [1] and [3] for Theorem 1 are based on the
following arguments: The symmetric mutual informations of the channels
created by the above procedure have a martingale property, from which
it follows that they must converge for almost all paths in the
construction. This shows that both limits in Theorem 1 exist. To prove
the claim on these limits' values, it would be sufficient to show that
(2) holds with strict inequalities for all $W^s$, unless
$I(W^s) \in \{0, 1\}$. Observe, however, that since the output
alphabets of the channels grow as the construction size increases,
this approach would require the aforementioned inequality to hold
uniformly for all $q$-ary input channels. This difficulty is
circumvented in [1] and [3] by appropriately defining an auxiliary
channel parameter $Z(W)$ and proving the convergence of $Z(W^s)$ to
$\{0, 1\}$ by the above arguments, which then implies the convergence
of $I(W^s)$ to $\{0, 1\}$.



The purpose of this note is to provide a proof of Theorem 1 that
avoids this indirect approach. In order to do so, we will need the
following theorem.

Theorem 2. If $I(W) \in (\delta, 1-\delta)$ for some $\delta > 0$, then
there exists an $\epsilon(\delta) > 0$ such that

$$I(W^-) + \epsilon(\delta) \le I(W) \le I(W^+) - \epsilon(\delta).$$

The dependence of $\epsilon(\delta)$ on the channel $W$ is only through
$\delta$, and not through particular channel specifications (e.g.,
output alphabet size).

Theorem 2 will be proved as a corollary to the following lemma, which
is the main result reported here.

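Lemma 1. Let $X_1, X_2 \in \mathcal{X}$ and $Y_1, Y_2 \in \mathcal{Y}$
be random variables with joint probability density

$$P_{X_1 Y_1 X_2 Y_2}(x_1, y_1, x_2, y_2) = P_{X_1 Y_1}(x_1, y_1) P_{X_2 Y_2}(x_2, y_2). \qquad (3)$$

If $H(X_1 \mid Y_1), H(X_2 \mid Y_2) \in (\delta, 1-\delta)$ for some
$\delta > 0$, then there exists an $\epsilon(\delta) > 0$ such that

$$H(X_1 + X_2 \mid Y_1, Y_2) - \max\{H(X_1 \mid Y_1), H(X_2 \mid Y_2)\} \ge \epsilon(\delta).$$

We will prove Lemma 1 in Section III.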

Proof of Theorem 2: It suffices to show that
$I(W) - I(W^-) \ge \epsilon(\delta)$, as the equality
$I(W^-) + I(W^+) = 2I(W)$ will then imply the second half of the claim.
Let $X_1, X_2 \in \mathcal{X}$ denote two independent and uniformly
distributed inputs to two copies of $W$, and let
$Y_1, Y_2 \in \mathcal{Y}$ be the corresponding outputs. Since $W$ is
memoryless, $X_1, X_2, Y_1, Y_2$ are jointly distributed as in (3).
Further, $I(W) \in (\delta, 1-\delta)$ implies

$$1 - I(W) = H(X_1 \mid Y_1) = H(X_2 \mid Y_2) \in (\delta, 1-\delta). \qquad (4)$$

It then follows from Lemma 1 that

$$I(W) - I(W^-) = H(X_1 + X_2 \mid Y_1 Y_2) - H(X_1 \mid Y_1) \ge \epsilon(\delta),$$

completing the proof. ■
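The identity used in the last display can be unpacked as follows. This is a sketch of the intermediate steps, relying only on the uniformity of $U_1$ and $X_1$ over $\mathcal{X}$ and on logarithms being taken to the base $q$, so that a uniform $q$-ary variable has entropy 1.

```latex
\begin{align*}
I(W^-) &= I(U_1; Y_1 Y_2) = H(U_1) - H(U_1 \mid Y_1 Y_2)
        = 1 - H(X_1 + X_2 \mid Y_1 Y_2), \\
I(W)   &= I(X_1; Y_1) = H(X_1) - H(X_1 \mid Y_1)
        = 1 - H(X_1 \mid Y_1), \\
I(W) - I(W^-) &= H(X_1 + X_2 \mid Y_1 Y_2) - H(X_1 \mid Y_1).
\end{align*}
```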

II. Proof of Theorem 1

Let $B_1, B_2, \dots$ be $\{-,+\}$-valued i.i.d. random variables with
$\Pr[B_i = -] = \Pr[B_i = +] = \frac{1}{2}$. Let $I_0, I_1, \dots$ be
random variables defined as

$$I_0 = I(W), \qquad I_n = I(W^{B_1 \cdots B_n}), \quad n = 1, 2, \dots$$

Note that $I_n$ takes values in $[0, 1]$. Further, it follows from the
relation $I(W^-) + I(W^+) = 2I(W)$ that
$E[I_{n+1} \mid I_n, \dots, I_0] = I_n$. Hence, the process
$I_0, I_1, \dots$ is a bounded martingale, and therefore converges
almost surely to a $[0, 1]$-valued random variable $I_\infty$. Note, on
the other hand, that

$$\Pr[I_n \in (\delta, 1-\delta)] = \frac{1}{2^n} \#\{s \in \{-,+\}^n : I(W^s) \in (\delta, 1-\delta)\}.$$

To conclude the proof, it thus suffices to show that
$\Pr[I_\infty = 1] = I(W)$ and $\Pr[I_\infty = 0] = 1 - I(W)$. To that
end, note that the almost sure convergence of the bounded process
$I_n$ implies

$$E[|I_{n+1} - I_n|] = \tfrac{1}{2} E[I(W^{B_1 \cdots B_n +}) - I(W^{B_1 \cdots B_n -})] \to 0,$$

where the equality holds because, by the relation
$I(W^-) + I(W^+) = 2I(W)$, both possible increments of $I_n$ have
magnitude equal to half the gap between the two new channels. It
follows from Theorem 2 that the latter convergence implies
$I_\infty \in \{0, 1\}$ with probability 1. Due to the martingale
property of $I_n$ we have $E[I_\infty] = E[I_0] = I(W)$, from which it
follows that $\Pr[I_\infty = 1] = 1 - \Pr[I_\infty = 0] = I(W)$,
completing the proof.
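The vanishing of the expected increment $E[|I_{n+1} - I_n|]$ can be computed exactly in a special case. The sketch below again assumes a $q$-ary erasure channel, for which $I(W^-) = i^2$ and $I(W^+) = 2i - i^2$ when $I(W) = i$; this closed form and the function names are assumptions for illustration and are not part of the original note. It evaluates the expected increment in two ways: directly, and as half the expected gap between the plus and minus channels.

```python
def level_capacities(i0, n):
    """All 2**n values of I_n, assuming an erasure channel with I(W) = i0."""
    caps = [i0]
    for _ in range(n):
        caps = [c for i in caps for c in (i * i, 2 * i - i * i)]
    return caps

def mean_abs_increment(i0, n):
    """E[|I_{n+1} - I_n|], averaging over both branches of every path."""
    caps = level_capacities(i0, n)
    return sum(0.5 * abs(i * i - i) + 0.5 * abs((2 * i - i * i) - i)
               for i in caps) / len(caps)

def half_mean_gap(i0, n):
    """(1/2) E[I(W^{s+}) - I(W^{s-})] over all paths s of length n."""
    caps = level_capacities(i0, n)
    return 0.5 * sum((2 * i - i * i) - i * i for i in caps) / len(caps)

for n in (0, 4, 8, 12):
    print(n, mean_abs_increment(0.6, n), half_mean_gap(0.6, n))
```

The two columns agree at every level and decrease with $n$, which is the behaviour that the proof combines with Theorem 2 to force $I_\infty \in \{0, 1\}$.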

III. Proof of Lemma 1

In what follows, $H(p)$ and $H(X)$ will both denote the entropy of a
random variable $X \in \mathcal{X}$ with probability distribution $p$.
We will let $p_i$, $i \in \mathcal{X}$, denote the probability
distribution with

$$p_i(m) = p(m - i).$$

The cyclic convolution of vectors $p$ and $r$ will be denoted by
$(p * r)$. That is,

$$(p * r) = \sum_{i \in \mathcal{X}} p(i) r_i = \sum_{i \in \mathcal{X}} r(i) p_i.$$
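The two definitions above translate directly into code. The following minimal sketch implements the cyclic shift $p_i$ and the cyclic convolution as a mixture of shifts; the function names `shift` and `cyclic_convolve` and the example vectors are assumptions for illustration.

```python
def shift(p, i):
    """Cyclic shift p_i, defined by p_i(m) = p(m - i) with indices modulo q."""
    q = len(p)
    return [p[(m - i) % q] for m in range(q)]

def cyclic_convolve(p, r):
    """(p * r) = sum_i p(i) r_i: a mixture of the cyclic shifts of r."""
    q = len(p)
    out = [0.0] * q
    for i in range(q):
        r_i = shift(r, i)
        for m in range(q):
            out[m] += p[i] * r_i[m]
    return out

p = [0.5, 0.3, 0.2]
r = [0.1, 0.6, 0.3]
print(cyclic_convolve(p, r))
print(cyclic_convolve(r, p))
```

When $p$ and $r$ are the distributions of two independent random variables over $\mathcal{X}$, `cyclic_convolve(p, r)` is the distribution of their modulo-$q$ sum, which is why the operation is symmetric in its arguments.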

We will also let $\mathrm{unif}(\mathcal{X})$ denote the uniform
distribution over $\mathcal{X}$. We will use the following lemmas in
the proof:

Lemma 2. Let $p$ be a distribution over $\mathcal{X}$. Then,

$$\|p - \mathrm{unif}(\mathcal{X})\|_1 \ge \frac{1}{q \log e} [1 - H(p)].$$

Remark 1. Lemma 2 partially complements Pinsker's inequality by
providing a lower bound to the $L_1$ distance between an arbitrary
probability distribution and the uniform distribution by their
Kullback-Leibler divergence.



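Proof:

$$1 - H(p) = \sum_{i \in \mathcal{X}} p(i) \log \frac{p(i)}{1/q}$$
$$\le \log e \sum_{i \in \mathcal{X}} p(i) \, \frac{p(i) - 1/q}{1/q}$$
$$\le q \log e \sum_{i \in \mathcal{X}} p(i) \, |p(i) - 1/q|$$
$$\le q \log e \, \|p - \mathrm{unif}(\mathcal{X})\|_1,$$

where we used the relation $\ln t \le t - 1$ in the first inequality.
■

Remark 2. Lemma 2 holds for distributions over arbitrary finite sets.
That $|\mathcal{X}|$ is a prime number has no bearing on the above
proof.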
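Lemma 2 is easy to test numerically. The sketch below compares the $L_1$ distance to the uniform distribution with the lower bound $[1 - H(p)] / (q \log e)$ on a few hand-picked distributions, including one over a four-letter alphabet in line with Remark 2. The function names and the example distributions are assumptions for illustration.

```python
import math

def entropy(p):
    """Entropy of p with logarithms to the base q = len(p)."""
    q = len(p)
    return -sum(x * math.log(x, q) for x in p if x > 0.0)

def l1_to_uniform(p):
    """L1 distance between p and the uniform distribution on len(p) symbols."""
    q = len(p)
    return sum(abs(x - 1.0 / q) for x in p)

def lemma2_lower_bound(p):
    """Right-hand side of Lemma 2: [1 - H(p)] / (q log e), logs to the base q."""
    q = len(p)
    log_e = 1.0 / math.log(q)  # logarithm of e to the base q
    return (1.0 - entropy(p)) / (q * log_e)

examples = [
    [1.0, 0.0, 0.0],            # point mass, q = 3
    [0.5, 0.3, 0.2],            # q = 3
    [0.4, 0.1, 0.3, 0.1, 0.1],  # q = 5
    [0.7, 0.1, 0.1, 0.1],       # q = 4 (composite, cf. Remark 2)
]
for p in examples:
    print(len(p), l1_to_uniform(p), lemma2_lower_bound(p))
```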

Lemma 3. Let $p$ be a distribution over $\mathcal{X}$. Then,

$$\|p_i - p_j\|_1 \ge \frac{1 - H(p)}{2q^2(q-1) \log e}$$

for all $i, j \in \mathcal{X}$, $i \ne j$. That is, unless $p$ is the
uniform distribution, its cyclic shifts will be separated from each
other in the $L_1$ distance.



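Proof: Let $j = i + m$ for some $m \ne 0$. We will show that there
exists a $k \in \mathcal{X}$ satisfying

$$|p(k) - p(k + m)| \ge \frac{1 - H(p)}{2q^2(q-1) \log e},$$

which will yield the claim since
$\|p_i - p_j\|_1 = \sum_{k \in \mathcal{X}} |p(k) - p(k + m)|$.

Suppose that $H(p) < 1$, as the claim is trivial otherwise. Let
$p^{(\ell)}$ denote the $\ell$th largest element of $p$, and let
$S = \{\ell : p^{(\ell)} \ge 1/q\}$. Note that $S$ is a proper subset of
$\mathcal{X}$. We have

$$\sum_{\ell=1}^{|S|} [p^{(\ell)} - p^{(\ell+1)}] = p^{(1)} - p^{(|S|+1)}$$
$$\ge p^{(1)} - \frac{1}{q}$$
$$\ge \frac{1}{2(q-1)} \|p - \mathrm{unif}(\mathcal{X})\|_1$$
$$\ge \frac{1 - H(p)}{2q(q-1) \log e}.$$

In the above, the second inequality is obtained by observing that
$p^{(1)} - 1/q$ is smallest when $p^{(1)} = \dots = p^{(q-1)}$, and the
third inequality follows from Lemma 2. Therefore, since the sum has
fewer than $q$ terms, there exists at least one $\ell \in S$ such that

$$p^{(\ell)} - p^{(\ell+1)} \ge \frac{1 - H(p)}{2q^2(q-1) \log e}.$$

Given such an $\ell$, let $A = \{1, \dots, \ell\}$, identified with the
set of symbols in $\mathcal{X}$ that carry the $\ell$ largest
probabilities. Since $q$ is prime, $\mathcal{X}$ can be written as

$$\mathcal{X} = \{k, \; k + m, \; k + m + m, \; \dots, \; k + \underbrace{m + \dots + m}_{q-1 \text{ times}}\}$$

for any $k \in \mathcal{X}$ and $m \in \mathcal{X} \setminus \{0\}$.
Therefore, since $A$ is a proper subset of $\mathcal{X}$, there exists
a $k \in A$ such that $k + m \in A^c$, implying

$$p(k) - p(k + m) \ge \frac{1 - H(p)}{2q^2(q-1) \log e},$$

which yields the claim. ■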
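The separation guaranteed by Lemma 3 can be observed numerically for prime alphabet sizes. The sketch below computes the smallest $L_1$ distance between two distinct cyclic shifts of $p$ and compares it with the lower bound of the lemma. The function names and the example distributions (over alphabets of size 5 and 3) are assumptions for illustration.

```python
import math

def entropy(p):
    """Entropy of p with logarithms to the base q = len(p)."""
    q = len(p)
    return -sum(x * math.log(x, q) for x in p if x > 0.0)

def min_shift_distance(p):
    """Smallest L1 distance between two distinct cyclic shifts p_i and p_j."""
    q = len(p)
    return min(sum(abs(p[(k - i) % q] - p[(k - j) % q]) for k in range(q))
               for i in range(q) for j in range(q) if i != j)

def lemma3_lower_bound(p):
    """Right-hand side of Lemma 3: [1 - H(p)] / (2 q^2 (q - 1) log e)."""
    q = len(p)
    log_e = 1.0 / math.log(q)  # logarithm of e to the base q
    return (1.0 - entropy(p)) / (2 * q ** 2 * (q - 1) * log_e)

examples = [
    [0.4, 0.1, 0.3, 0.1, 0.1],  # q = 5
    [1.0, 0.0, 0.0, 0.0, 0.0],  # point mass, q = 5
    [0.7, 0.2, 0.1],            # q = 3
]
for p in examples:
    print(len(p), min_shift_distance(p), lemma3_lower_bound(p))
```

In these examples the observed separation exceeds the bound by a wide margin, which is consistent with the bound being designed for uniformity over all distributions rather than for tightness.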



Lemma 4. Let $p$ and $r$ be two probability distributions over
$\mathcal{X}$, with $H(p) \ge \eta$ and $H(r) \le 1 - \eta$ for some
$\eta > 0$. Then, there exists an $\epsilon_1(\eta) > 0$ such that

$$H(p * r) \ge H(r) + \epsilon_1(\eta).$$

Proof: Let $e_i$ denote the distribution with a unit mass on
$i \in \mathcal{X}$. Since $H(p) \ge \eta > H(e_i) = 0$, it follows from
the continuity of entropy that

$$\min_i \|p - e_i\|_1 \ge \mu(\eta) \qquad (5)$$

for some $\mu(\eta) > 0$. On the other hand, since
$H(r) \le 1 - \eta$, we have by Lemma 3 that

$$\|r_i - r_j\|_1 \ge \frac{\eta}{2q^2(q-1) \log e} > 0 \qquad (6)$$

for all pairs $i \ne j$. Relations (5), (6), and the strict concavity
of entropy imply the existence of $\epsilon_1(\eta) > 0$ such that

$$H(p * r) = H\Big(\sum_i p(i) r_i\Big)$$
$$\ge \sum_i p(i) H(r_i) + \epsilon_1(\eta)$$
$$= H(r) + \epsilon_1(\eta).$$

■



Proof of Lemma 1: Let $P_1$ and $P_2$ be two random probability
distributions on $\mathcal{X}$, with

$$P_1 = P_{X_1 \mid Y_1}(\cdot \mid y_1) \text{ whenever } Y_1 = y_1,$$
$$P_2 = P_{X_2 \mid Y_2}(\cdot \mid y_2) \text{ whenever } Y_2 = y_2.$$

It is then easy to see that

$$H(X_1 \mid Y_1) = E[H(P_1)],$$
$$H(X_2 \mid Y_2) = E[H(P_2)],$$
$$H(X_1 + X_2 \mid Y_1, Y_2) = E[H(P_1 * P_2)].$$

Suppose, without loss of generality, that
$H(X_1 \mid Y_1) \ge H(X_2 \mid Y_2)$. It suffices to show that if
$E[H(P_1)], E[H(P_2)] \in (\delta, 1-\delta)$ for some $\delta > 0$,
then there exists an $\epsilon(\delta) > 0$ such that
$E[H(P_1 * P_2)] \ge E[H(P_1)] + \epsilon(\delta)$. To that end, define
the event

$$A = \{H(P_1) < 1 - \delta/2, \; H(P_2) > \delta/2\}.$$

Observe that

$$\delta < E[H(P_2)] \le (1 - \Pr[H(P_2) > \delta/2]) \cdot \delta/2 + \Pr[H(P_2) > \delta/2],$$

implying $\Pr[H(P_2) > \delta/2] > \frac{\delta}{2-\delta}$. It
similarly follows that
$\Pr[H(P_1) < 1 - \delta/2] > \frac{\delta}{2-\delta}$. Note further
that $H(P_1)$ and $H(P_2)$ are independent since $Y_1$ and $Y_2$ are.
Thus, $A$ has probability at least
$\frac{\delta^2}{(2-\delta)^2} =: \epsilon_2(\delta)$. On the other
hand, Lemma 4, applied with $p = P_2$ and $r = P_1$, implies that
conditioned on $A$ we have

$$H(P_1 * P_2) \ge H(P_1) + \epsilon_1(\delta/2) \qquad (7)$$

for some $\epsilon_1(\delta/2) > 0$. Thus,

$$E[H(P_1 * P_2)] = \Pr[A] \cdot E[H(P_1 * P_2) \mid A] + \Pr[A^c] \cdot E[H(P_1 * P_2) \mid A^c]$$
$$\ge \Pr[A] \cdot E[(H(P_1) + \epsilon_1(\delta/2)) \mid A] + \Pr[A^c] \cdot E[H(P_1) \mid A^c]$$
$$\ge E[H(P_1)] + \epsilon_1(\delta/2) \epsilon_2(\delta),$$

where in the first inequality we used (7) and the relation
$H(p * r) \ge H(p)$. Setting
$\epsilon(\delta) := \epsilon_1(\delta/2) \epsilon_2(\delta)$ yields the
result. ■
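The quantities appearing in this proof can be evaluated directly for small examples. The sketch below computes $E[H(P_1)]$, $E[H(P_2)]$ and $E[H(P_1 * P_2)]$ from two joint distributions and returns the gap bounded in Lemma 1. The function names, the `joint[x][y]` representation, and the choice of ternary symmetric channels with uniform inputs as test cases are assumptions for illustration.

```python
import math

def entropy(p):
    """Entropy of p with logarithms to the base q = len(p)."""
    q = len(p)
    return -sum(x * math.log(x, q) for x in p if x > 0.0)

def cyclic_convolve(p, r):
    """Distribution of the modulo-q sum of independent draws from p and r."""
    q = len(p)
    return [sum(p[i] * r[(m - i) % q] for i in range(q)) for m in range(q)]

def posteriors(joint):
    """Split joint[x][y] = P(X = x, Y = y) into pairs (P(Y = y), P(X = . | Y = y))."""
    q, n_out = len(joint), len(joint[0])
    out = []
    for y in range(n_out):
        p_y = sum(joint[x][y] for x in range(q))
        if p_y > 0.0:
            out.append((p_y, [joint[x][y] / p_y for x in range(q)]))
    return out

def lemma1_gap(joint1, joint2):
    """H(X1 + X2 | Y1, Y2) - max{H(X1 | Y1), H(X2 | Y2)} for independent pairs."""
    post1, post2 = posteriors(joint1), posteriors(joint2)
    h1 = sum(w * entropy(p) for w, p in post1)  # E[H(P1)]
    h2 = sum(w * entropy(p) for w, p in post2)  # E[H(P2)]
    h_sum = sum(w1 * w2 * entropy(cyclic_convolve(p1, p2))  # E[H(P1 * P2)]
                for w1, p1 in post1 for w2, p2 in post2)
    return h_sum - max(h1, h2)

def symmetric_joint(q, eps):
    """Hypothetical example: uniform input through a q-ary symmetric channel."""
    return [[(1 - eps if y == x else eps / (q - 1)) / q for y in range(q)]
            for x in range(q)]

gap = lemma1_gap(symmetric_joint(3, 0.2), symmetric_joint(3, 0.4))
trivial = lemma1_gap(symmetric_joint(3, 0.2), symmetric_joint(3, 0.0))
print(gap, trivial)
```

The second call uses a noiseless second pair, so that $H(X_2 \mid Y_2) = 0$ falls outside the interval $(\delta, 1-\delta)$ for every $\delta > 0$; the gap then vanishes, which shows why the hypothesis of Lemma 1 cannot be dropped.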



IV. Discussion 

The proof of Theorem 2 does not extend trivially to the case of
composite input alphabet sizes. In particular, that the cyclic group
$(\{0, \dots, q-1\}, +)$ is generated by each of its non-zero elements
is crucial to the proof of Lemma 3. On the other hand, a weaker
statement holds when the input alphabet size is composite: Consider
replacing the mapping (1) with

Ui — Xi + X2, 

where $\pi$ is a permutation over $\mathcal{X}$, and define the
channels $W^- : U_1 \to Y_1 Y_2$ and $W^+ : U_2 \to Y_1 Y_2 U_1$
accordingly. Then, it can be shown that there exists a permutation
$\pi$ for which Theorem 2 holds, irrespective of the input alphabet
size. The proof of this statement is similar to that of Theorem 2 and
therefore is omitted. It then follows that channels with composite
input alphabet sizes can be polarized in the sense of Theorem 1 if the
mapping in (8) is chosen appropriately at each step of construction.
Whether such channels can be polarized by recursive application of a
fixed mapping is an open question.
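The role of primality can be made concrete with a small numerical example. The sketch below, which is an illustration under assumed names and hand-picked distributions rather than part of the original argument, takes $q = 4$ and a distribution that is uniform on the subgroup $\{0, 2\}$: its cyclic shift by 2 coincides with itself, so the separation of Lemma 3 fails, and convolving it with itself yields no entropy gain, so the conclusion of Lemma 4 fails as well. A distribution of the same shape over the prime alphabet of size 5 shows a strictly positive gain.

```python
import math

def entropy(p):
    """Entropy of p with logarithms to the base q = len(p)."""
    q = len(p)
    return -sum(x * math.log(x, q) for x in p if x > 0.0)

def cyclic_convolve(p, r):
    """Distribution of the modulo-q sum of independent draws from p and r."""
    q = len(p)
    return [sum(p[i] * r[(m - i) % q] for i in range(q)) for m in range(q)]

def shift_distance(p, m):
    """L1 distance between p and its cyclic shift by m."""
    q = len(p)
    return sum(abs(p[k] - p[(k + m) % q]) for k in range(q))

# Composite alphabet, q = 4: p4 is uniform on the subgroup {0, 2}.
p4 = [0.5, 0.0, 0.5, 0.0]
gain4 = entropy(cyclic_convolve(p4, p4)) - entropy(p4)
dist4 = shift_distance(p4, 2)

# Prime alphabet, q = 5: a distribution supported on two symbols.
p5 = [0.5, 0.0, 0.5, 0.0, 0.0]
gain5 = entropy(cyclic_convolve(p5, p5)) - entropy(p5)
dist5 = min(shift_distance(p5, m) for m in range(1, 5))

print(entropy(p4), gain4, dist4)
print(entropy(p5), gain5, dist5)
```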